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Abstract 

Recent studies have shown that real-valued principal component analysis can be applied 
to earthquake fault systems for forecasting and prediction. In addition, theoretical analysis 
indicates that earthquake stresses may obey a wave-like equation, having solutions with 
inverse frequencies for a given fault similar to those that characterize the time intervals 
between the largest events on the fault. It is therefore desirable to apply complex principal 
component analysis to develop earthquake forecast algorithms. In this paper we modify the 
Pattern Informatics method of earthquake forecasting to take advantage of the wave-like 
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properties of seismic stresses and utilize the Hilbert transform to create complex eigenvec- 
tors out of measured time series. We show that Pattern Informatics analyses using com- 
plex eigenvectors create short-term forecast hot-spot maps that differ from hot-spot maps 
created using only real-valued data and suggest methods of analyzing the differences and 
calculating the information gain. 
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1 Introduction 



Principal component analysis (PCA) is a mathematical procedure that transforms a 
set of correlated variables into a smaller set of uncorrelated variables called prin- 
cipal components. The first principal component accounts for as much of the vari- 
ability in the data as possible, and each succ eeding component attempts to account 



for the remaining variability. 



Savage (1988) introduced PCA to the seismic com- 



munity by using it to decompose time series data into a complete set of orthonormal 
subspaces that isolate spatial and temporal eigensources. 



Complex principal component analysis is an extension of classical principal com- 
ponent analysis in which the spatial basis vectors represent the eigenfunctions of 
a complex correlation matrix. It is closely related to principal oscillation pattern 



(POP) analysis, in which the oscillat ing basis 



of a deterministic feedback matrix (Penland, 



pattern states are the eigenfunctions 



1989) (both techniques empirically 
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identify time-dependent spatial patterns in a multivariate time series of geophys- 
ical or other data). POP analysis has been shown to be reasonably successful in 
forecasting El N ino-Southern Oscillation (ENSO) events up to a year in advance 



(Wu et al 



19941) . In complex PCA, a real-valued time series is analytical ly con- 
tinued into the complex-valued domain by means of a Hilbert transform jliorel 



1984), then the N x N complex correlation matrix is formed via cross-correlation 



of the N independent time series. Thes e methods have been applied extensively 



in the atmospheric and ocean sciences (Penland 


1989 


Burger, 


1993; 


Zhang et al. 


1997; 


Eaaer 


1999 


Kim and North 


1999 


). 



The primary benefit of complex PCA compared to other analysis procedures is that 
it allows propagating features within the time series to be detected and dissected in 



1984). In particular, localized 



terms of their spatial and temporal behavior (Horel, 
propagating phenomena, if they exist, can be easily detected. Classical PCA, for 
example, allows only the detection of standing oscillations. 



Recently it has been shown that the same kinds of real- valued PCA a nalysis can be 



applied to ea rthquak e fault s ystems for forecasting and prediction (Ru ndle et al 



2000b; 



Tia mpo et al. 



2002a.b). It is known that earthquakes recur in complex cy- 



cles, similar to ENSO event s, albeit with the larger earthquake events having sub 



stantially longer time scales ( Scholz, 



1990) than t hose that app. 



y to ENSO-typically 



2004) indicates that earth- 



a decade or less. In addition, theoretical analysis (Klein, 
quake stresses may obey a wave-like equation, having solutions with inverse fre- 
quencies for a given fault similar to those that characterize the time intervals be- 
tween the largest events on the fault. It is of considerable interest to apply complex 
PCA and POP analysis to develop earthquake forecast algorithms, taking account 



of the complex cyclic and quasi-periodic nature of these events. 
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A problem with this approach is that earthquake event time series are typically 
not continuous and differentiable, but instead are point processes, both in space 
and in time. In addition, high quality measurements of earthquakes have only been 
comprehensively observed with instruments for a few decades, so the complete 
(high-density) time series that are available are relatively short compared to the re- 
currence periods for large earthquakes of hundreds of years and longer. The Pattern 
Informatics (PI) method for earthquake forecasting is well suited for these types 
of impulsive time series and performs very well w ith data sets much sh orter than 



2005). As such, 



the recurrence periods for large earthquakes events (Hollid ay et all 
it is an ideal candidate for modification to use complex eigenfunctions and eigen- 
vectors. Assuming that seismic phenomena are analytic, caus ality considerations 
allow us to apply the Cauchy Riemann dispersion relations (lArfken and Webei , 
2001) and analytically continue the measured time series from the real axis into the 
entire upper half -plane of complex space. In this new space we propose to utilize 
the PI method. 



2 Modified Method 



Our modified PI method is based on the idea that the future time evolution o ' 



seismicity can be de scribed by pure phase dynamics dMori and Kuramoto 



Rundleetal 



1998 



2000a b), hence a complex seismic phase function <S(xj, £5, t) is con- 
structed and allowed to rotate in its Hilbert space. This modified representation 
of the input data serves two purposes. First, a complex Hil bert space a llows de- 



1984). This is 



tection both of standing oscillations and traveling waves (Horel, 
important for identifying the quasi-periodic nature of seismicity. Second, the con- 
struction allows for interference between the real and imaginary parts of the phase 
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function. This interference helps correlate geographic locations which are spatially 
separated. 

To create our phase function, the geographic area of interest is partitioned into N 
square bins centered on a point Xj and with an edge length Sx determined by the 
nature of the physical system. For our analysis we chose 5x = 0.1° ~ 11km, 
corresponding to the linear size of a magnitude M ~ 6 earthquake. Within each 
box, a time series ipobsfaiit) is defined by counting how many earthquakes with 
magnitude greater than M min occurred during the time period t to t + St. These 
time series are interpreted as the real-valued portion of an analytic signal, and thus 
the entire signal is recreated by combining ijj obs with its Hilbert transform: 

4> obs (Xi,t) -> tf(Xj,t) = ?pobs+1pobs, (1) 



where V wfx^t) 



specified ( 



Bracewell. 



-JLL 



oo ip(xi,T)dr 

-OO t^T 



and Cauchy principal value integration is 



1999). Next, the activity rate function <S(xj, t b , T) is defined 
as the average rate of occurrence of earthquakes in box % over the period t b to T: 



S(xi,t b ,T) 



T-t b 



(2) 



If t b is held to be a fixed time, S(xi,t b ,T) can be interpreted as the zth com- 
ponent of a general, time-dependent vector evolving in an iV-dimensional space 



( Tiampo et al. , 



2002b). Furthermore, it can be shown that this iV-dimensional cor- 



relatio n space is defined by the eigenvectors of an iVx correlation matrix (Ru ndle et al 




. In order to remove the final free parameter in the system-the choice of 
base year-changes in the activity rate function are then averaged over all possible 
base-time periods: 



S(xi,t ,T) 



T,J b=t0 S(xj,t b ,T) 

T-tn 



(3) 
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The base averaged activity rate function is then normalized by subtracting the spa- 
tial mean over all boxes and scaling to give a unit-norm: 



S(x.i,t ,T) 



t , T) - ± Ef =1 5(x j; t , T) 
Ef=i[5( Xj , t , T) - I Ef=i 5(xfc, to, T)} 



(4) 



The requirement that the rate functions have a constant norm helps remove random 
fluctuations from the system. Following the assumption of pure phase dynamics 



(Rundleetal 



2000a. b), the important changes in seismicity will be given by the 
change in the normalized base averaged activity rate function from the time period 
h to t 2 : 



A5(Xj, to, ti, t 2 ) — <S(Xj, t , t 2 ) — <S(Xj, to, ti 



(5) 



This is simply a pure rotation of the iV-dimensional unit vector S (xj, t , T) through 
time. Finally, the probability of change of activity in a given box is deduced from 
the square of its base averaged, mean normalized change in activity rate: 



P(xi,t ,ti,t 2 ) = [A5(x i ,t ,ti,t 2 )]* x [A5(x i ,t ,ti,t 2 )], 



(6) 



where multiplication and complex conjugation are indicated. In phase dynamical 
systems, probabilities are always rela t ed to the square of the associated vector phase 



function (Mori and Kura moto . 



1998; 



Run dle et al.L l2000b). This probability func- 



tion is often given relative to the background by subtracting off its spatial mean: 



P'(Xj,to,tl,t 2 ) =>- P(Xi,t ,ti,t 2 ) - /I, 



(7) 



Where fi = — Y,f=i ^( x j, to,h, t 2 ) and P' indicates the probability of change in 
activity is measured relative to the background. 
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Fig. 1. Logarithmic seismic hot-spot map for large earthquake events with M > 5 for the 
forecasted time period 1 August 2004 to 3 1 July 2009 using (A) complex eigenvectors and 
(B) real-valued eigenvectors. Data from the SCEDC catalog was used below 35° North 
latitude while data from the NCEDC catalog was used above 35° North latitude. Figure (C) 
is a difference map plotted with a linear scale. 
3 Application Of The Method 

As an application of the modified PI method, we created a short-term forecast seis- 
mic hot-spot map for Southern California over the time period 1 August 2004 to 
31 July 2009. The result is shown in Figure 1A. Also presented in Figure 1 is the 
same forecast map created with real- valued eigenvectors (IB) and a difference map 
between the two methods (1C). 

Two data sets were employed in this analysis, the first being the entire historic seis- 
mic catalog from 1 January 1932 through 31 July 2004, obtained from the Southern 
California Earthquake Data Center (SCEDC) on-line searchable database 1 , with 
all non-local and blast events specifically removed. The relevant data consists of 
location, in East longitude and North latitude, and the date the event occurred. Seis- 
mic events between —122° and —115° longitude and between 32° and 35° latitude 
(any depth and quality) and with magnitude greater than or equal to M min = 3.0 
were selected. Data from the time period 1977-1980 is currently missing from 

1 http ://w w w. data, scec.org/catalog_search/index. html 
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the database but can be found at the older Southern California Seismic Network 
(SCSN) archives 2 . 

The second source of data employed in this analysis was acquired from the North- 
ern California Earthquake Data Center (NCEDC) on-line searchable database 3 , 
with all non-local and blast events again specifically removed. When incorporating 
this catalog, seismic events between —122° and —115° longitude and between 35° 
and 37° latitude (any depth and quality) and with magnitude greater than or equal to 
M min = 3.0 were selected. The necessity for utilizing this additional catalog in our 
analysis arises from various earthquake events in the vicinity of 35° North latitude 
missing from the SCEDC/SCSN catalog but present in the NCEDC collection. 

The necessity of combining catalogs arises from the fact that the SCEDC catalog is 
not complete in its network coverage above the joining mark. Most notably, it does 
not contain earthquakes from the San Simeon region (location of the M = 6.5, 
2003 event). 

As can be seen in Figure 1, the map created using complex eigenvectors is similar 
to the map created using real-valued eigenvectors. Important differences, however, 
are present. Most prominent are the increased emphasis of forecasted activity sur- 
rounding 36° North latitude, —1 17.9° East longitude and the decreased emphasis of 
forecasted activity southwest of the 1999 Hector Mine events. While future moni- 
toring of these areas will be necessary to help determine the accuracy and reliability 
of complex PI analysis, certain measurements can be performed to estimate the in- 
formation gain. 



2 http://www.data.scec.org/ftp/catalogs/SCSN/ 

3 http://quake.geo.berkeley.edu/ncedc/catalog-search.html 
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3. 1 Entropy 



Using methods from information theory (iCover and ThomasLll99ll) . we can calcu- 
late the entropy, H, of our two hot-spot maps. Entropy can be considered a measure 
of disorder (e.g. randomness) or "surprise", hence maps with lower entropy contain 
more useful information than maps with higher entropy. We define entropy as 



N 



H ( z ) = -J2p(*i'> z ) lo £P( x ;; z )> 



(8) 



i=l 



where 



P(-x i ,t ,t 1 ,t 2 ) 




P(-x i ,t ,t 1 ,t 2 ) < z ' 



(9) 



and the probabilities are scaled such that Y$Li p( x i) = 1- This definitions allows a 
measurement of entropy as a function of some lower threshold. 



Performing this calculation on the two maps indicates that the complex PI anal- 
ysis does indeed yield more useful information (lower if -value) than the original 



analysis, but only when the lower threshold is non-zero. Wit h comp 



ex PCA cal- 



1984). Since 



culations, sudden transitions and noisy spikes are emphasized (Horel, 
seismic time series data can be approximated by chains of delta functions, we ex- 
pect that calculations in the complex domain would contain more low-level noise. 
A small, non-zero threshold allows us to measure the entropy above and relative to 
this noise. 
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3.2 ROC Analysis 



A second measure for the accuracy of the hot-spot maps can be inferred from rela- 
tive operating characteristic (ROC) diagrams. ROC curves are essentially signal de- 
tection curves for binary forecasts obtained by plotting the hit rate (v-axis) against 
the fa lse alarm rate (x-axis) over a range of different thresholds ( Joliffee and Stephenson, 



2003). Originally established for verifying tornado forecasts (|Murphy and Winklei , 



19871) . ROC framework s have recently become popular in the seismic community 



as well (Molchan, 



1997) 



While only one year has passed since the onset of the hot-spot forecasts given in 
Figure 1, we can create ROC diagrams for the two maps by considering a "hit" 
to be any box i with P(x;, t , t 1; t 2 ) > z, for some threshold z, that contains a 
future large earthquake. Similarly we consider a "false alarm" to be any box j 
with P(x-j, t , t\, t 2 ) > z, for some threshold z, that does not contain a future large 
earthquake. Since a successful forecast will maximize the hit rate while minimizing 
the false alarm rate, a measure of the forecast accuracy is given by the area, A RO c 
under the ROC curve. It can be shown that A RO c 1 for a perfect forecast and 
Aroc — * 0.5 for a forecast consisting of randomly distributed alarms. 



Performing this calculation on the two maps again indicates that the complex PI 
analysis is better correlated with future large events (higher A ROC -ya\ue) than the 
original analysis. It is important to consider, however, that this analysis is only 
using one year of observed future seismicity. A full analysis should be performed 
at the end of the forecast interval. 
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4 Conclusion 



Complex PCA is a useful tool and is ideally suited for many applications. There 
are, however, situations where the results of complex PCA are difficult to interpret 
such as when both amplitude and phase relationships must be considered. For these 
types of systems, the existence of phase information by itself suggests the need for 
an analysis in the full complex domain. 

The theoretical evidence that earthquake stress fields are wave-like in nature indi- 
cates that seismicity is better studied using complex time series. Due to its ability to 
create seismic hot-spot forecast maps using relatively short time series data and its 
handling of impulsive data sets, the PI method is naturally extended to this complex 
domain. 

In our five year seismic hot-spot forecast for southern California, the map created 
using complex eigenvectors has subtle differences with the the map created using 
real-valued eigenvectors. These differences result in more useful information (i.e. 
a reduction in the map entropy) and in better apparent correlation with future large 
earthquakes. Future monitoring and testing, however, will be necessary to conclu- 
sively determine the accuracy and reliability of complex PI analysis. 
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